Goto

Collaborating Authors

 neural attention


Latent Alignment and Variational Attention

Neural Information Processing Systems

Neural attention has become central to many state-of-the-art models in natural language processing and related domains. Attention networks are an easy-to-train and effective method for softly simulating alignment; however, the approach does not marginalize over latent alignments in a probabilistic sense. This property makes it difficult to compare attention to other alignment approaches, to compose it with probabilistic models, and to perform posterior inference conditioned on observed data. A related latent approach, hard attention, fixes these issues, but is generally harder to train and less accurate. This work considers variational attention networks, alternatives to soft and hard attention for learning latent variable alignment models, with tighter approximation bounds based on amortized variational inference. We further propose methods for reducing the variance of gradients to make these approaches computationally feasible. Experiments show that for machine translation and visual question answering, inefficient exact latent variable models outperform standard neural attention, but these gains go away when using hard attention based training. On the other hand, variational attention retains most of the performance gain but with training speed comparable to neural attention.


Latent Alignment and Variational Attention

Neural Information Processing Systems

Neural attention has become central to many state-of-the-art models in natural language processing and related domains. Attention networks are an easy-to-train and effective method for softly simulating alignment; however, the approach does not marginalize over latent alignments in a probabilistic sense. This property makes it difficult to compare attention to other alignment approaches, to compose it with probabilistic models, and to perform posterior inference conditioned on observed data. A related latent approach, hard attention, fixes these issues, but is generally harder to train and less accurate. This work considers variational attention networks, alternatives to soft and hard attention for learning latent variable alignment models, with tighter approximation bounds based on amortized variational inference. We further propose methods for reducing the variance of gradients to make these approaches computationally feasible. Experiments show that for machine translation and visual question answering, inefficient exact latent variable models outperform standard neural attention, but these gains go away when using hard attention based training. On the other hand, variational attention retains most of the performance gain but with training speed comparable to neural attention.


Neural Attention: A Novel Mechanism for Enhanced Expressive Power in Transformer Models

DiGiugno, Andrew, Mahmood, Ausif

arXiv.org Artificial Intelligence

Transformer models typically calculate attention matrices using dot products, which have limitations when capturing nonlinear relationships between embedding vectors. We propose Neural Attention, a technique that replaces dot products with feed-forward networks, enabling a more expressive representation of relationships between tokens. This approach modifies only the attention matrix calculation while preserving the matrix dimensions, making it easily adaptable to existing transformer-based architectures. We provide a detailed mathematical justification for why Neural Attention increases representational capacity and conduct controlled experiments to validate this claim. When comparing Neural Attention and Dot-Product Attention, NLP experiments on WikiText-103 show a reduction in perplexity of over 5 percent. Similarly, experiments on CIFAR-10 and CIFAR-100 show comparable improvements for image classification tasks. While Neural Attention introduces higher computational demands, we develop techniques to mitigate these challenges, ensuring practical usability without sacrificing the increased expressivity it provides. This work establishes Neural Attention as an effective means of enhancing the predictive capabilities of transformer models across a variety of applications.


Latent Alignment and Variational Attention

Deng, Yuntian, Kim, Yoon, Chiu, Justin, Guo, Demi, Rush, Alexander

Neural Information Processing Systems

Neural attention has become central to many state-of-the-art models in natural language processing and related domains. Attention networks are an easy-to-train and effective method for softly simulating alignment; however, the approach does not marginalize over latent alignments in a probabilistic sense. This property makes it difficult to compare attention to other alignment approaches, to compose it with probabilistic models, and to perform posterior inference conditioned on observed data. A related latent approach, hard attention, fixes these issues, but is generally harder to train and less accurate. This work considers variational attention networks, alternatives to soft and hard attention for learning latent variable alignment models, with tighter approximation bounds based on amortized variational inference.


How to Generate Text from Images with Python

#artificialintelligence

In the Google Search: State of the Union last May, John Mueller and Martin Splitt spent about a fourth of the address to image-related topics. They announced a big list of improvements to Google Image Search and predicted that it would be a massive untapped opportunity for SEO. SEO Clarity, an SEO tool vendor, released a very interesting report around the same time. Among other findings, they found that more than a third of web search results include images. Images are important to search visitors not only because they are visually more attractive than text, but they also convey context instantly that would require a lot more time when reading text.


Neural Attention: Machine Learning Meets Neuroscience

#artificialintelligence

Neural attention has been applied successfully to a variety of different applications including natural language processing, vision, and memory. An attractive aspect of these neural models is their ability to extract relevant features from data, with minimal feature engineering.Brian Cheung is a PhD Student at UC Berkeley working with Professor Bruno Olshausen, as well as an Intern at Google Brain. By drawing inspiration from the fields of neuroscience and machine learning, he hopes to create systems which can solve complex vision tasks using attention and memory. At the Deep Learning Summit in Singapore, Brian will share expertise on the fovea as an emergent property of visual attention, ways we can extend this ability to learning interpretable structural features of the attention window itself, and finding conditions where these emergent properties are amplified or eliminated providing clues to their function. I asked him a few questions ahead of the summit to learn more.